Hands-on Exercise 3

Author

Emma Chew En Chin

Published

September 3, 2025

Modified

September 4, 2025

1 Overview

Spatial Point Pattern Analysis (SPPA) - The pattern or distribution of a set of points on a surface.

First-Order SPPA (1st-SPPA) - Intensity/density of points across a study area.

Launching five R packages below:

  • spatstat : for point pattern analysis
  • terra : modern spatial data analysis package, used to convert image output generate by spatstat into terra format
  • rvest : for scraping of data from web pages
pacman::p_load(sf, terra, spatstat, 
               tmap, rvest, tidyverse)

2 Importing and Wrangling Datasets

Import the MasterPlan2019 data. st_zm() is used to remove Z (elevation) and M (measure) dimensions from geospatial geometries.

mpsz_sf <- st_read("data/geospatial/MasterPlan2019SubzoneBoundaryNoSeaKML.kml") %>% 
  st_zm(drop = TRUE, what = "ZM") %>% st_transform(crs = 3414)
Reading layer `URA_MP19_SUBZONE_NO_SEA_PL' from data source 
  `C:\emmachew\ISSS626-emma-chew-en-chin\Hands-on Exercise\Hands-on_Ex03\data\geospatial\MasterPlan2019SubzoneBoundaryNoSeaKML.kml' 
  using driver `KML'
Simple feature collection with 332 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

Build a function to extract attributes REGION_N, PLN_AREA_N, SUBZONE_N, SUBZONE_C from Description field.

extract_kml_field <- function(html_text, field_name) {
  if (is.na(html_text) || html_text == "") return(NA_character_)
  
  page <- read_html(html_text)
  rows <- page %>% html_elements("tr")
  
  value <- rows %>%
    keep(~ html_text2(html_element(.x, "th")) == field_name) %>%
    html_element("td") %>%
    html_text2()
  
  if (length(value) == 0) NA_character_ else value
}
mpsz_sf <- mpsz_sf %>%
  mutate(
    REGION_N = map_chr(Description, extract_kml_field, "REGION_N"),
    PLN_AREA_N = map_chr(Description, extract_kml_field, "PLN_AREA_N"),
    SUBZONE_N = map_chr(Description, extract_kml_field, "SUBZONE_N"),
    SUBZONE_C = map_chr(Description, extract_kml_field, "SUBZONE_C")
  ) %>%
  select(-Name, -Description) %>%
  relocate(geometry, .after = last_col())
mpsz_cl <- mpsz_sf %>%
  filter(SUBZONE_N != "SOUTHERN GROUP",
         PLN_AREA_N != "WESTERN ISLANDS",
         PLN_AREA_N != "NORTH-EASTERN ISLANDS")
write_rds(mpsz_cl, 
          "data/mpsz_cl.rds")

Import the ChildCareService data.

childcare_sf <- st_read("data/geospatial/ChildCareServices.kml") %>% 
  st_zm(drop = TRUE, what = "ZM") %>%
  st_transform(crs = 3414)
Reading layer `CHILDCARE' from data source 
  `C:\emmachew\ISSS626-emma-chew-en-chin\Hands-on Exercise\Hands-on_Ex03\data\geospatial\ChildCareServices.kml' 
  using driver `KML'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6878 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84

Plotting a static map of childcare_sf.

plot(st_geometry(mpsz_sf), col = "grey")
plot(st_geometry(childcare_sf), add = TRUE, pch = 19, col = "black", cex = 0.6)

Plotting an interactive map of childcare_sf.

tmap_mode('view')
tm_shape(childcare_sf)+
  tm_dots()
tmap_mode('plot')

3 Geospatial Data Wrangling

Convert sf (Simple Features) objects into spatstat ppp and owin objects.

3.1 Conversting sf dataframes to ppp class

Using spatstat to convert point event data in **ppp* object form.

Next, use class() to verify the object class.

childcare_ppp <- as.ppp(childcare_sf)
class(childcare_ppp)
[1] "ppp"
summary(childcare_ppp)
Marked planar point pattern:  1925 points
Average intensity 2.417323e-06 points per square unit

Coordinates are given to 11 decimal places

Mark variables: Name, Description
Summary:
     Name           Description       
 Length:1925        Length:1925       
 Class :character   Class :character  
 Mode  :character   Mode  :character  

Window: rectangle = [11810.03, 45404.24] x [25596.33, 49300.88] units
                    (33590 x 23700 units)
Window area = 796335000 square units

3.2 Creating owin object

owin object is used to represent a polygonal region (e.g. Singapore boundary)

Using as.owin() to convert mpsz_sf into owin object.

Next, use class() to verify the object class.

sg_owin <- as.owin(mpsz_cl)
class(sg_owin)
[1] "owin"
plot(sg_owin)

3.3 Combining point events object and owin object

Combine both point and polygon feature into one ppp object class.

childcareSG_ppp = childcare_ppp[sg_owin]
childcareSG_ppp
Marked planar point pattern: 1925 points
Mark variables: Name, Description 
window: polygonal boundary
enclosing rectangle: [2667.54, 55941.94] x [21448.47, 50256.33] units

4 Clark-Evan Test for Nearest Neighbour Analysis (NNA)

Nearest Neighbor Analysis (NNA) is a spatial statistics method that calculates the average distance between each point and its closest neighbor to determine if a pattern of points is clustered, dispersed, or randomly distributed.

Clark-Evans test of aggregation is performed using clarkevans.test() of the spatstat.explore package.

The test hypotheses are:

Ho = The distribution of childcare services are randomly distributed.

H1 = The distribution of childcare services are not randomly distributed.

The 95% confident interval will be used.

4.1. Performing the Clark-Evans test without CSR

clarkevans.test(childcareSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"))

    Clark-Evans test
    No edge correction
    Z-test

data:  childcareSG_ppp
R = 0.53532, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)

4.2. Performing the Clark-Evans test with CSR

Argument method = “MonteCarlo” is used. The p-value for the test is computed by comparing the observed value of R to the results obtained from nsim (i.e. 39, 99, 999) simulated realisations of Complete Spatial Randomness (CSR) conditional on the observed number of points.

clarkevans.test(childcareSG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                method="MonteCarlo",
                nsim=99)

    Clark-Evans test
    No edge correction
    Monte Carlo test based on 99 simulations of CSR with fixed n

data:  childcareSG_ppp
R = 0.53532, p-value = 0.01
alternative hypothesis: clustered (R < 1)

From the results, we can conclude at a 95% statistical significance level that we can reject the null hypothesis that the distribution of childcare services are randomly distributed.

This shows that there is a pattern to which childcare services are distributed.

5 Kernel Density Estimation Method (KDE)

Kernel Density Estimation (KDE) visualises and analyses first-order spatial point patterns.

It transforms discrete point data into continuous density surfaces that reveal clusters and variations in event occurrences, without making prior assumptions about data distribution.

Use cases:

  • Data distribution understanding
  • Identification of hotspots
  • Explore relationships between spatial variables

5.1 Working with automatic bandwidth selection method

We can compute a kernel density using different bandwidth selection methods:

  • bw.diggle() : Adaptive bandwidth suitable for clustered patterns
  • bw.CvL() : Cross-validation based method
  • bw.scott() : Rule-of-thumb bandwidth (good for normal distributions)
  • bw.ppl() : Pilot bandwidth selection

Each method can be chosen based on the data characteristics and level of smoothing to be achieved.

Below, we use bw.diggle(), and use the smoothing kernel “gaussian”.

kde_SG_diggle <- density(
  childcareSG_ppp,
  sigma=bw.diggle,
  edge=TRUE,
  kernel="gaussian") 
plot(kde_SG_diggle)

Note: The density values of the output range from 0 to 0.00003727443 which is way too small to comprehend. This is because the default unit of measurement of svy21 is in meter. As a result, the density values computed is in “number of points per square meter”.

summary(kde_SG_diggle)
real-valued pixel image
128 x 128 pixel array (ny, nx)
enclosing rectangle: [2667.538, 55941.94] x [21448.47, 50256.33] units
dimensions of each pixel: 416 x 225.0614 units
Image is defined on a subset of the rectangular grid
Subset area = 669941961.12249 square units
Subset area fraction = 0.437
Pixel values (inside window):
    range = [-6.584123e-21, 3.063698e-05]
    integral = 1927.788
    mean = 2.877545e-06

We can retrieve the bandwidth used to compute the kde layer as below.

bw <- bw.diggle(childcareSG_ppp)
bw
   sigma 
295.9712 

5.2 Rescaling KDE values

rescale.ppp() is used to convert the unit of measurement from meter to kilometer.

childcareSG_ppp_km <- rescale.ppp(
  childcareSG_ppp, 1000, "km")

Now, we re-run density() and plot the output kde map. Output image remains the same as the earlier output, but only the data values on the legend have changed.

kde_childcareSG_km <- density(childcareSG_ppp_km,
                              sigma=bw.diggle,
                              edge=TRUE,
                              kernel="gaussian")
plot(kde_childcareSG_km)

5.3 Working with different automatic bandwidth methods

Trying the other functions used to determine the bandwidth.

bw.CvL(childcareSG_ppp_km)
   sigma 
4.357209 
bw.scott(childcareSG_ppp_km)
 sigma.x  sigma.y 
2.159749 1.396455 
bw.ppl(childcareSG_ppp_km)
   sigma 
0.378997 
bw.diggle(childcareSG_ppp_km)
    sigma 
0.2959712 

bw.ppl() can be used to produce more appropriate values when the pattern consists predominantly of tight clusters.

bw.diggle() can be used to study and detect a single tight cluster in the midst of random noise.

kde_childcareSG.ppl <- density(childcareSG_ppp_km, 
                               sigma=bw.ppl, 
                               edge=TRUE,
                               kernel="gaussian")
par(mfrow=c(1,2))
plot(kde_childcareSG_km, main = "bw.diggle")
plot(kde_childcareSG.ppl, main = "bw.ppl")

5.4 Working with different kernel methods

By default, the kernel method used in density.ppp() is gaussian.

There are 3 other options:

  • Epanechnikov
  • Quartic
  • Dics
par(mfrow=c(2,2), mar=c(1,2,2,2))
plot(density(childcareSG_ppp_km, 
             sigma=0.2959712, 
             edge=TRUE, 
             kernel="gaussian"), 
     main="Gaussian")
plot(density(childcareSG_ppp_km, 
             sigma=0.2959712, 
             edge=TRUE, 
             kernel="epanechnikov"), 
     main="Epanechnikov")
plot(density(childcareSG_ppp_km, 
             sigma=0.2959712, 
             edge=TRUE, 
             kernel="quartic"), 
     main="Quartic")
plot(density(childcareSG_ppp_km, 
             sigma=0.2959712, 
             edge=TRUE, 
             kernel="disc"), 
     main="Disc")

6 Fixed and Adaptive KDE

6.1 Computing KDE by using fixed bandwidth

A KDE layer is computed by defining a bandwidth of 600 meter.

The sigma value used is 0.6, because the unit of measurement of childcareSG_ppp_km object is in kilometer, hence the 600m is 0.6km.

kde_childcareSG_fb <- density(childcareSG_ppp_km,
                              sigma=0.6, 
                              edge=TRUE,
                              kernel="gaussian")
plot(kde_childcareSG_fb)

6.2 Computing KDE by using adaptive bandwidth

Fixed bandwidth method is very sensitive to highly skew distribution of spatial point patterns over geographical units for example urban versus rural. One way to overcome this problem is by using adaptive bandwidth instead.

kde_childcareSG_ab <- adaptive.density(
  childcareSG_ppp_km, 
  method="kernel")
plot(kde_childcareSG_ab)

Comparing the fixed and adaptive kernel density estimation output below.

par(mfrow=c(1,2), mar=c(2,2,2,1))
plot(kde_childcareSG_fb, main = "Fixed bandwidth")
plot(kde_childcareSG_ab, main = "Adaptive bandwidth")

7 Plotting cartographic quality KDE map

7.1 Converting gridded output into raster

Converting the kernel density objects into SpatRaster objects, using rast() of the terra package.

kde_childcareSG_bw_terra <- rast(kde_childcareSG_km)

Using class() to verify if kde_childcareSG_bw_terra data belongs to the SpatRaster class.

class(kde_childcareSG_bw_terra)
[1] "SpatRaster"
attr(,"package")
[1] "terra"

The crs property is empty.

kde_childcareSG_bw_terra
class       : SpatRaster 
size        : 128, 128, 1  (nrow, ncol, nlyr)
resolution  : 0.4162063, 0.2250614  (x, y)
extent      : 2.667538, 55.94194, 21.44847, 50.25633  (xmin, xmax, ymin, ymax)
coord. ref. :  
source(s)   : memory
name        :         lyr.1 
min value   : -5.824417e-15 
max value   :  3.063698e+01 
unit        :            km 

7.2 Assigning projection systems

We have to assign the CRS information back on kde_childcareSG_bw_terra layer.

crs(kde_childcareSG_bw_terra) <- "EPSG:3414"

How, the coordinates reference is in SVY21.

kde_childcareSG_bw_terra
class       : SpatRaster 
size        : 128, 128, 1  (nrow, ncol, nlyr)
resolution  : 0.4162063, 0.2250614  (x, y)
extent      : 2.667538, 55.94194, 21.44847, 50.25633  (xmin, xmax, ymin, ymax)
coord. ref. : SVY21 / Singapore TM (EPSG:3414) 
source(s)   : memory
name        :         lyr.1 
min value   : -5.824417e-15 
max value   :  3.063698e+01 
unit        :            km 

7.3 Plotting KDE map with tmap

Displaying the raster in cartographic quality map using the tmap package.

Raster values are encoded explicitly onto the raster pixel using the values in “layer.1” field.

tm_shape(kde_childcareSG_bw_terra) + 
  tm_raster(col.scale = 
              tm_scale_continuous(
                values = "viridis"),
            col.legend = tm_legend(
            title = "Density values",
            title.size = 0.7,
            text.size = 0.7,
            bg.color = "white",
            bg.alpha = 0.7,
            position = tm_pos_in(
              "right", "bottom"),
            frame = TRUE)) +
  tm_graticules(labels.size = 0.7) +
  tm_compass() +
  tm_layout(scale = 1.0)

8 First Order SPPA at the Planning Subzone Level

We will analyse at the planning area level.

8.1 Geospatial data wrangling

8.1.1 Extracting study area

Extract target planning areas.

pg <- mpsz_cl %>%
  filter(PLN_AREA_N == "PUNGGOL")
tm <- mpsz_cl %>%
  filter(PLN_AREA_N == "TAMPINES")
ck <- mpsz_cl %>%
  filter(PLN_AREA_N == "CHOA CHU KANG")
jw <- mpsz_cl %>%
  filter(PLN_AREA_N == "JURONG WEST")

Review extracted areas.

par(mfrow=c(2,2), mar=c(2,2,2,1))
plot(st_geometry(pg), main = "Ponggol")
plot(st_geometry(tm), main = "Tampines")
plot(st_geometry(ck), main = "Choa Chu Kang")
plot(st_geometry(jw), main = "Jurong West")

8.1.2 Create owin object

Convert sf objects into owin.

pg_owin = as.owin(pg)
tm_owin = as.owin(tm)
ck_owin = as.owin(ck)
jw_owin = as.owin(jw)

8.1.3 Combining point events object and owin object

childcare_pg_ppp = childcare_ppp[pg_owin]
childcare_tm_ppp = childcare_ppp[tm_owin]
childcare_ck_ppp = childcare_ppp[ck_owin]
childcare_jw_ppp = childcare_ppp[jw_owin]

Use rescale.ppp() to transform the unit of measurement from metre to kilometre.

childcare_pg_ppp.km = rescale.ppp(childcare_pg_ppp, 1000, "km")
childcare_tm_ppp.km = rescale.ppp(childcare_tm_ppp, 1000, "km")
childcare_ck_ppp.km = rescale.ppp(childcare_ck_ppp, 1000, "km")
childcare_jw_ppp.km = rescale.ppp(childcare_jw_ppp, 1000, "km")

Plotting the 4 study areas and the locations of the childcare centres.

par(mfrow=c(2,2), mar=c(2,2,2,1))
plot(unmark(childcare_pg_ppp.km), 
  main="Punggol")
plot(unmark(childcare_tm_ppp.km), 
  main="Tampines")
plot(unmark(childcare_ck_ppp.km), 
  main="Choa Chu Kang")
plot(unmark(childcare_jw_ppp.km), 
  main="Jurong West")

8.2 Conduct Clark and Evans Test

8.2.1 Choa Chu Kang planning area

clarkevans.test(childcare_ck_ppp,
                correction="none",
                clipregion=NULL,
                alternative=c("two.sided"),
                nsim=999)

    Clark-Evans test
    No edge correction
    Z-test

data:  childcare_ck_ppp
R = 0.84097, p-value = 0.008866
alternative hypothesis: two-sided

8.2.2 Tampines planning area

clarkevans.test(childcare_tm_ppp,
                correction="none",
                clipregion=NULL,
                alternative=c("two.sided"),
                nsim=999)

    Clark-Evans test
    No edge correction
    Z-test

data:  childcare_tm_ppp
R = 0.66817, p-value = 6.58e-12
alternative hypothesis: two-sided

8.3 Computing KDE surfaces by planning area

Computing the KDE of each planning area, using bw.diggle().

par(mfrow=c(2,2), mar=c(2,2,2,1))
plot(density(childcare_pg_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Punggol")
plot(density(childcare_tm_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Tempines")
plot(density(childcare_ck_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Choa Chu Kang")
plot(density(childcare_jw_ppp.km, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"),
     main="Jurong West")